Protein and gene model inference based on statistical modeling in k-partite graphs.
Abstract
One of the major goals of proteomics is the comprehensive and accurate description of a proteome. Shotgun proteomics, the method of choice for the analysis of complex protein mixtures, requires that experimentally observed peptides be mapped back to the proteins they were derived from, a process known as protein inference. We present Markovian Inference of Proteins and Gene Models (MIPGEM), a statistical model based on clearly stated assumptions that addresses protein and gene model inference for shotgun proteomics data. In particular, we handle dependencies among peptides and proteins using a Markovian assumption on k-partite graphs, and we address the problems of shared peptides and ambiguous proteins by scoring the encoding gene models. Empirical results on two control datasets with synthetic mixtures of proteins and on complex protein samples of Saccharomyces cerevisiae, Drosophila melanogaster, and Arabidopsis thaliana suggest that MIPGEM is competitive with existing tools for protein inference.
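The layered structure mentioned in the abstract (peptides, proteins, gene models) can be pictured as a tripartite graph. The following is only a minimal sketch of that structure and of what "shared peptides" means in it; it is not the MIPGEM statistical model, and all identifiers and toy data (`pepA`, `P1`, `G1`, `classify`, and so on) are hypothetical names chosen for illustration.

```python
# Hypothetical toy data: which proteins each observed peptide could come from,
# and which gene model encodes each protein. These two maps are the edges of a
# tripartite (k = 3) graph: peptide -> protein -> gene model.
peptide_to_proteins = {
    "pepA": {"P1"},
    "pepB": {"P1", "P2"},  # shared by two proteins of the same gene model
    "pepC": {"P3"},
    "pepD": {"P2", "P3"},  # shared by proteins of different gene models
}
protein_to_gene = {"P1": "G1", "P2": "G1", "P3": "G2"}


def gene_models_for(peptide):
    """Follow peptide -> protein -> gene model edges and collect the gene models."""
    return {protein_to_gene[p] for p in peptide_to_proteins[peptide]}


def classify(peptide):
    """Label a peptide by how ambiguous its origin is at each layer."""
    proteins = peptide_to_proteins[peptide]
    genes = gene_models_for(peptide)
    if len(proteins) == 1:
        return "unique to protein"
    if len(genes) == 1:
        return "shared by proteins, unique to gene model"
    return "shared across gene models"


labels = {pep: classify(pep) for pep in sorted(peptide_to_proteins)}
for pep, label in labels.items():
    print(pep, "->", label)
```

In this toy graph, a peptide that is ambiguous at the protein layer can still be unambiguous at the gene model layer, which is one intuition for why scoring gene models may help with shared peptides. A real inference method would attach probabilities to the nodes and edges rather than simple set membership.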
Similar resources
Unmixed $r$-partite graphs
Unmixed bipartite graphs have been characterized by Ravindra and Villarreal independently. Our aim in this paper is to characterize unmixed $r$-partite graphs under a certain condition, which is a generalization of Villarreal's theorem on bipartite graphs. Also, we give some examples and counterexamples in relevance to this subject.
Some Algebraic and Combinatorial Properties of the Complete $T$-Partite Graphs
In this paper, we characterize the shellable complete $t$-partite graphs. We also show for these types of graphs the concepts vertex decomposable, shellable and sequentially Cohen-Macaulay are equivalent. Furthermore, we give a combinatorial condition for the Cohen-Macaulay complete $t$-partite graphs.
k-Partite cliques of protein interactions: A novel subgraph topology for functional coherence analysis on PPI networks.
Many studies are aimed at identifying dense clusters/subgraphs from protein-protein interaction (PPI) networks for protein function prediction. However, the prediction performance based on the dense clusters is actually worse than a simple guilt-by-association method using neighbor counting ideas. This indicates that the local topological structures and properties of PPI networks are still open...
Determination of Volumetric Mass Transfer Coefficient in Gas-Solid-Liquid Stirred Vessels Handling High Solids Concentrations: Experiment and Modeling
Rigorous analysis of the determinants of the volumetric mass transfer coefficient (kLa) and its accurate forecasting are of vital importance for effectively designing and operating stirred reactors. The majority of the available literature is limited to systems with low solids concentration, while there has always been a need to investigate the gas-liquid hydrodynamics in tanks handling ...
Marginal Analysis of A Population-Based Genetic Association Study of Quantitative Traits with Incomplete Longitudinal Data
A common study to investigate gene-environment interaction is designed to be longitudinal and population-based. Data arising from longitudinal association studies often contain missing responses. Naive analysis without taking missingness into account may produce invalid inference, especially when the missing data mechanism depends on the response process. To address this issue in the ana...
Journal title:
- Proceedings of the National Academy of Sciences of the United States of America
Volume 107, Issue 27
Pages -
Publication date 2010